context window
- Asia > Singapore (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Asia > Indonesia > Bali (0.04)
- (6 more...)
- Information Technology (0.46)
- Government (0.46)
- Education (0.46)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (2 more...)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Virginia > Arlington County (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (4 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Taiwan (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada (0.04)
- Asia > China > Beijing > Beijing (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
Transformer-based large language models (LLMs) typically have a limited context window, resulting in significant performance degradation when processing text beyond that length. Numerous approaches have been proposed to extend the context window and achieve length extrapolation of LLMs, but in-depth interpretation of these approaches is still lacking. In this study, we explore the positional information within and beyond the context window to decipher the underlying mechanism of LLMs. Using a mean-based decomposition method, we disentangle positional vectors from the hidden states of LLMs and analyze their formation and their effect on attention. Furthermore, when texts exceed the context window, we analyze how positional vectors change in two settings, i.e., direct extrapolation and context window extension. Based on our findings, we design two training-free context window extension methods, positional vector replacement and attention window extension. Experimental results show that our methods can effectively extend the context window length.
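The mean-based decomposition described in the abstract can be approximated with a short script: averaging one layer's hidden states over many unrelated full-length sequences cancels most content-specific signal, leaving a per-position estimate. The sketch below assumes a Hugging Face causal LM; the function name, the chosen layer, and the corpus handling are illustrative assumptions, not the paper's released code.

```python
import torch

@torch.no_grad()
def estimate_positional_vectors(model, tokenizer, corpus, seq_len=2048, layer=12):
    """Average one layer's hidden states over many full-length sequences.

    Content-specific signal is assumed to average out across unrelated texts,
    leaving an approximate positional vector for each of the seq_len positions.
    `corpus` is an iterable of raw text strings (hypothetical helper input).
    """
    running_sum, count = None, 0
    for text in corpus:
        ids = tokenizer(text, return_tensors="pt",
                        truncation=True, max_length=seq_len).input_ids
        if ids.shape[1] < seq_len:            # keep only full-length sequences
            continue
        hidden = model(ids, output_hidden_states=True).hidden_states[layer][0]
        running_sum = hidden if running_sum is None else running_sum + hidden
        count += 1
    return running_sum / count                # shape: (seq_len, hidden_dim)
```

The resulting per-position vectors could then be inspected, or substituted into another sequence's hidden states, in the spirit of the positional vector replacement the abstract mentions.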
On Memory: A comparison of memory mechanisms in world models
World models enable agents to plan within imagined environments by predicting future states conditioned on past observations and actions. However, their ability to plan over long horizons is limited by the effective memory span of the backbone architecture. This limitation leads to perceptual drift in long rollouts, hindering the model's capacity to perform loop closures within imagined trajectories. In this work, we investigate the effective memory span of transformer-based world models through an analysis of several memory augmentation mechanisms. We introduce a taxonomy that distinguishes between memory encoding and memory injection mechanisms, motivating their roles in extending the world model's memory through the lens of residual stream dynamics. Using a state recall evaluation task, we measure the memory recall of each mechanism and analyze their respective trade-offs. Our findings show that memory mechanisms improve the effective memory span in vision transformers and provide a path to completing loop closures within a world model's imagination.
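The encoding/injection taxonomy can be made concrete with a toy module: one path compresses past hidden states into a fixed set of memory slots (an encoding mechanism), and another cross-attends from the current tokens to those slots and adds the result to the residual stream (an injection mechanism). The PyTorch sketch below is illustrative only; slot counts, dimensions, and module choices are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MemoryAugmentedBlock(nn.Module):
    """Toy illustration of the encoding/injection split."""

    def __init__(self, dim=256, n_slots=16, n_heads=4):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(n_slots, dim) * 0.02)
        self.encoder = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.injector = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def encode_memory(self, past_states):
        # Memory encoding: learned slots attend over past hidden states,
        # producing a fixed-size summary regardless of history length.
        slots = self.slots.unsqueeze(0).expand(past_states.size(0), -1, -1)
        memory, _ = self.encoder(slots, past_states, past_states)
        return memory                          # (batch, n_slots, dim)

    def forward(self, hidden, memory):
        # Memory injection: current tokens cross-attend to the memory slots
        # and the result is added back to the residual stream.
        injected, _ = self.injector(self.norm(hidden), memory, memory)
        return hidden + injected               # (batch, seq_len, dim)
```

Separating the two paths this way makes the trade-off the abstract analyzes visible: the encoder bounds how much history can be summarized, while the injector determines how that summary re-enters the backbone's residual stream.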
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation
Zhan Peng Lee, Andre Lin, Calvin Tan
Retrieval-Augmented Generation (RAG) has emerged as a powerful framework to improve factuality in large language models (LLMs) by grounding their outputs in retrieved documents. However, ensuring perfect retrieval of relevant information remains challenging, and when irrelevant content is passed downstream to an LLM, it can lead to hallucinations. In this work, we propose Finetune-RAG, a simple and effective fine-tuning approach that features a first-of-its-kind RAG training dataset constructed to mimic real-world retrieval imperfections. Experimental results show that Finetune-RAG improves factual accuracy by 21.2% over the base model. We also propose Bench-RAG, an LLM-as-a-judge evaluation pipeline that stress-tests models under realistic imperfect-retrieval scenarios. Our codebase and dataset are fully open-sourced for community use.
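One way to mimic the imperfect-retrieval setup described above is to pair each question with one relevant passage and one distractor, and train the model to answer only from the relevant passage. The sketch below is a hypothetical data-construction helper; the field names and prompt wording are assumptions, not the released dataset's actual schema.

```python
import json
import random

def build_finetune_example(question, correct_doc, distractor_doc, answer):
    """Build one chat-style fine-tuning record with imperfect retrieval:
    the prompt interleaves a relevant passage with an irrelevant one in
    random order, and the target answer is grounded only in the relevant
    passage, teaching the model to ignore the distractor.
    """
    docs = [correct_doc, distractor_doc]
    random.shuffle(docs)
    context = "\n\n".join(f"[Document {i + 1}]\n{d}" for i, d in enumerate(docs))
    prompt = (
        "Answer the question using only the documents that are actually "
        "relevant. If no document answers it, say you do not know.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": answer},
    ]}

if __name__ == "__main__":
    record = build_finetune_example(
        question="What year was the transformer architecture introduced?",
        correct_doc="The transformer architecture was introduced in 2017.",
        distractor_doc="Basalt is a common volcanic rock.",
        answer="The transformer architecture was introduced in 2017.",
    )
    print(json.dumps(record, indent=2))
```

An LLM-as-a-judge harness in the spirit of Bench-RAG would then score generations from such prompts for whether they stay grounded in the relevant passage or drift toward the distractor.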